Exercises!

Load the data from “data/bank_2.csv”

library(tidyverse)
library(funModeling)
library(corrr)

data_bank=read_delim("../data/bank_2.csv", delim = ";")

1 - Find the correlations among the numeric variables.

cor_bank=data_bank %>% select_if(is.numeric) %>% correlate()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
cor_bank
## # A tibble: 7 x 8
##   rowname       age  balance      day duration campaign    pdays previous
##   <chr>       <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
## 1 age      NA        0.00297 -0.00865   0.0168 -0.00206 -0.00440 -0.00230
## 2 balance   0.00297 NA        0.0105    0.0224 -0.0139   0.0174   0.0308 
## 3 day      -0.00865  0.0105  NA        -0.0185  0.137   -0.0772  -0.0590 
## 4 duration  0.0168   0.0224  -0.0185   NA      -0.0416  -0.0274  -0.0267 
## 5 campaign -0.00206 -0.0139   0.137    -0.0416 NA       -0.103   -0.0497 
## 6 pdays    -0.00440  0.0174  -0.0772   -0.0274 -0.103   NA        0.507  
## 7 previous -0.00230  0.0308  -0.0590   -0.0267 -0.0497   0.507   NA
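
For reference, a matrix like this can also be built with base R. The sketch below uses a small made-up data frame (`toy` and its values are illustrative, not taken from bank_2.csv) and mirrors the settings reported in the message above: Pearson correlation with pairwise-complete observations. Blanking the diagonal is done only to match how correlate() prints it.

```r
# Toy numeric data with one missing value (illustrative, not from bank_2.csv)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 10),
  c = c(5, 3, NA, 2, 1)
)

# Pearson correlation; each pair uses only the rows where both values are present
cor_toy <- cor(toy, method = "pearson", use = "pairwise.complete.obs")

# correlate() shows the diagonal as NA, so blank it here for comparison
diag(cor_toy) <- NA

round(cor_toy, 3)
```

Because `b` is an exact multiple of `a`, their correlation is 1; the pair involving `c` is computed on the four complete rows only.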

2 - Find all linear correlations between the input variables and the output

cor_bank_2=data_bank %>% select_if(is.numeric) %>% correlate() %>% stretch()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'
cor_bank_2
## # A tibble: 49 x 3
##    x       y               r
##    <chr>   <chr>       <dbl>
##  1 age     age      NA      
##  2 age     balance   0.00297
##  3 age     day      -0.00865
##  4 age     duration  0.0168 
##  5 age     campaign -0.00206
##  6 age     pdays    -0.00440
##  7 age     previous -0.00230
##  8 balance age       0.00297
##  9 balance balance  NA      
## 10 balance day       0.0105 
## # … with 39 more rows
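
The tibble above only covers numeric columns, so `deposit` itself does not appear in it. One possible way to relate the inputs to the output is to encode the target as 0/1 first. The sketch below assumes that `deposit` holds the strings "yes" and "no" (the output above does not confirm this) and runs on made-up data; `toy_bank`, `deposit_num`, and `r_ranked` are hypothetical names.

```r
# Made-up sample; the "yes"/"no" coding of deposit is an assumption
toy_bank <- data.frame(
  age      = c(25, 40, 33, 58, 47, 29),
  duration = c(100, 600, 150, 800, 90, 700),
  deposit  = c("no", "yes", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Encode the target as 1 ("yes") / 0 ("no") so Pearson's r can be computed
toy_bank$deposit_num <- as.numeric(toy_bank$deposit == "yes")

# Correlate every numeric input with the encoded target
inputs <- c("age", "duration")
r_target <- sapply(inputs, function(v) {
  cor(toy_bank[[v]], toy_bank$deposit_num, use = "pairwise.complete.obs")
})

# Rank the inputs by absolute correlation, strongest first
r_ranked <- r_target[order(abs(r_target), decreasing = TRUE)]
r_ranked
```

Within corrr, focus() can play a similar role: after adding the encoded column, `correlate() %>% focus(deposit_num)` keeps only the correlations against that column.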

3 - Find the most important variables according to the gain ratio (this may take a while)

# trick to make it run faster: take a sample
data_bank_sample=sample_n(data_bank,1000)
res_rank_info_bank=var_rank_info(data = data_bank_sample, target = "deposit")
## Warning in KL.plugin(freqs2d, freqs.null, unit = unit): Vanishing value(s)
## in argument freqs2!

## Warning in KL.plugin(freqs2d, freqs.null, unit = unit): Vanishing value(s)
## in argument freqs2!
res_rank_info_bank
##          var    en    mi            ig           gr
## 1      pdays 3.411 0.195 0.19463516730 0.0747986747
## 2    contact 2.015 0.057 0.05688788373 0.0531448615
## 3   poutcome 2.159 0.051 0.05095001807 0.0421652953
## 4    housing 1.969 0.031 0.03093520064 0.0309441298
## 5   previous 2.513 0.043 0.04261050342 0.0274274906
## 6      month 4.027 0.079 0.07884737855 0.0253719285
## 7        age 6.295 0.082 0.08184351447 0.0152201445
## 8       loan 1.508 0.007 0.00719833180 0.0139823484
## 9   campaign 3.266 0.026 0.02597614846 0.0113154541
## 10       job 4.118 0.021 0.02130729548 0.0067876409
## 11       day 5.809 0.032 0.03158712250 0.0065254371
## 12 education 2.582 0.009 0.00919664433 0.0057739176
## 13   marital 2.343 0.007 0.00727194433 0.0053913876
## 14   default 1.087 0.000 0.00007547927 0.0008640825
## 15   balance 9.363 0.797            NA           NA
## 16  duration 9.296 0.698            NA           NA
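
To help read the `ig` and `gr` columns, here is a minimal base-R sketch of the textbook definitions: information gain is H(X) + H(Y) - H(X, Y), and gain ratio divides it by the entropy of the input, H(X). The data are made up and the helper names (`entropy_bits`, `gain_ratio`) are hypothetical; var_rank_info() may bin variables and estimate entropies differently, so the numbers are not expected to match it exactly.

```r
# Shannon entropy, in bits, of a discrete vector
entropy_bits <- function(x) {
  p <- as.vector(table(x)) / length(x)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Information gain and gain ratio of input x with respect to target y
gain_ratio <- function(x, y) {
  h_x  <- entropy_bits(x)
  h_y  <- entropy_bits(y)
  h_xy <- entropy_bits(paste(x, y, sep = "|"))  # joint entropy H(X, Y)
  ig   <- h_x + h_y - h_xy                      # information gain
  # Note: gr is NaN when x is constant, because h_x is then 0
  c(h_x = h_x, ig = ig, gr = ig / h_x)
}

# Made-up categorical input and target (illustrative only)
contact <- c("cell", "cell", "phone", "phone", "cell", "phone", "cell", "phone")
deposit <- c("yes",  "yes",  "no",    "no",    "yes",  "no",    "no",   "yes")

gain_ratio(contact, deposit)
```

Dividing by H(X) penalizes inputs with many distinct values, which would otherwise get a high information gain simply by splitting the data into many small groups.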

Plots for predictive models

Analyze the correlation between the input variables and the output ‘deposit’. If you do not pass ‘input’, the function runs for all variables ;)

4 - Use the cross_plot function

cross_plot(data = data_bank, target = 'deposit')
## Warning in remove_na_target(data, target = target): There were removed 5
## rows with NA values in target variable 'deposit'.
## Plotting transformed variable 'age' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'balance' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'day' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'duration' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'campaign' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'pdays' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
## Plotting transformed variable 'previous' with 'equal_freq', (too many values). Disable with 'auto_binning=FALSE'
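
The warning about removed rows is triggered by missing values in the target, and without ‘input’ every variable gets plotted. A possible tidier call, sketched below, drops those rows once up front and passes ‘input’ explicitly. The toy data and the name `bank_clean` are made up for illustration; the plotting call only runs in an interactive session where funModeling is installed.

```r
# Made-up rows standing in for data_bank; two of them have a missing target
toy_bank <- data.frame(
  age     = c(25, 40, 33, 58, 47, 29, 51, 36),
  housing = c("yes", "no", "yes", "no", "no", "yes", "no", "yes"),
  deposit = c("no", "yes", NA, "yes", "no", "yes", NA, "no"),
  stringsAsFactors = FALSE
)

# Drop rows with a missing target once, before any plotting
bank_clean <- toy_bank[!is.na(toy_bank$deposit), ]
nrow(bank_clean)

# Plot only the chosen inputs against the target
if (interactive() && requireNamespace("funModeling", quietly = TRUE)) {
  funModeling::cross_plot(
    data   = bank_clean,
    input  = c("age", "housing"),
    target = "deposit"
  )
}
```

On the real data, the same idea would be `data_bank %>% filter(!is.na(deposit))` followed by the cross_plot call with the inputs of interest.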

5 - Use the plotar function (boxplot and histdens)

plotar(data = data_bank, target = 'deposit', plot_type = 'boxplot')
## Warning in remove_na_target(data, target = target): There were removed 5
## rows with NA values in target variable 'deposit'.

plotar(data = data_bank, target = 'deposit', plot_type = 'histdens')
## Warning in remove_na_target(data, target = target): There were removed 5
## rows with NA values in target variable 'deposit'.

6 - Use the rplot() function from the corrr package (google it) on the mtcars dataset. Like iris, mtcars is already available in the R environment.

mtcars %>% select_if(is.numeric) %>% correlate() %>% rplot()
## 
## Correlation method: 'pearson'
## Missing treated using: 'pairwise.complete.obs'